DIPS - 11/04/2024
Traditional educational research often fixates on average academic achievement.
Average performance provides no information about the consistency of academic achievement over time or within a cluster.
Consistent performance is positively correlated with student motivation and predicts more favorable long-term educational outcomes.
High variability can result in disproportionate representation of certain demographics at both ends of the achievement spectrum.
We propose a methodology to identify clustering units (students, classrooms, etc.) that exhibit either unusually large or unusually small within-cluster variance.
We present an adaptation of the mixed-effects location scale model (MELSM) that shrinks random effects toward their fixed effect using spike-and-slab regularization.
MELSM allows for the simultaneous estimation of a model for the means (location) and a model for the residual variance (scale).
Both sub-models are conceptualized as mixed-effects models and can accommodate specific predictors.
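For concreteness, an intercept-only version of the two sub-models can be sketched as follows (notation matches the equations in this section; \(\beta_0\), the fixed location intercept, is an added symbol for illustration):

\[\begin{aligned} y_{ij} &= \beta_0 + u_{0j} + \varepsilon_{ij}, \qquad \varepsilon_{ij} \sim \mathcal{N}\left(0, \sigma^2_{\varepsilon_{ij}}\right), \\ \log(\sigma_{\varepsilon_{ij}}) &= \eta_0 + t_{0j}, \end{aligned}\]

where \(u_{0j}\) and \(t_{0j}\) are the school-specific location and scale intercepts.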
\[\begin{equation} \textbf{v}= \begin{bmatrix} u_0 \\ t_0 \end{bmatrix} \sim \mathcal{N} \left( \begin{bmatrix} 0 \\ 0 \end{bmatrix}, \boldsymbol{\Sigma}= \begin{bmatrix} \tau^2_{u_0} & \tau_{u_0t_0} \\ \tau_{u_0t_0} & \tau^2_{t_0} \end{bmatrix} \right) \end{equation}\]
Examining the nature of this variability can yield insights that go beyond the mean achievement levels.
This accounts for possible correlations between location and scale effects.
We incorporate the spike-and-slab prior as a method of variable selection for random effects in the scale model.
The model is allowed to switch between two assumptions (each random scale effect is either included or excluded):
First, \(\boldsymbol{\Sigma}\) can be decomposed into \(\boldsymbol\Sigma = \boldsymbol{\tau}\boldsymbol{\Omega\tau}'\) to specify independent priors for each element of \(\boldsymbol{\tau}\) and \(\boldsymbol{\Omega}\), where \(\textbf{L}\) is the Cholesky factor of the correlation matrix (\(\boldsymbol{\Omega} = \textbf{L}\textbf{L}'\)):
\[\begin{equation} \label{eq:cholesky_approach} \textbf{L} = \begin{pmatrix} 1 & 0 \\ \rho_{u_0t_0} & \sqrt{1 - \rho_{u_0t_0}^2} \end{pmatrix} \end{equation}\]
\[\begin{equation} \textbf{v} = \boldsymbol{\tau}\textbf{L}\boldsymbol{z}. \end{equation}\]
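As a numeric sanity check (not part of the model code; the standard deviations and correlation are hypothetical values), the identity \(\boldsymbol{\Sigma} = \boldsymbol{\tau}\textbf{L}\textbf{L}'\boldsymbol{\tau}'\) implied by \(\textbf{v} = \boldsymbol{\tau}\textbf{L}\boldsymbol{z}\) can be verified with NumPy:

```python
import numpy as np

# Hypothetical random-effect SDs and location-scale correlation
tau_u0, tau_t0, rho = 0.8, 0.5, 0.3

tau = np.diag([tau_u0, tau_t0])             # diagonal matrix of SDs
L = np.array([[1.0, 0.0],                   # Cholesky factor of the
              [rho, np.sqrt(1 - rho**2)]])  # 2x2 correlation matrix

# v = tau @ L @ z with z ~ N(0, I) implies Cov(v) = tau L L' tau'
Sigma = tau @ L @ L.T @ tau.T
print(Sigma)
# Diagonal: tau_u0**2 and tau_t0**2; off-diagonal: tau_u0 * tau_t0 * rho
```

The lower-triangular parameterization lets each draw of \(\boldsymbol{z}\) be mapped to correlated random effects without ever inverting \(\boldsymbol{\Sigma}\).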
Now, we include an indicator variable (\(\delta_{jk}\)) for each random effect to be subjected to shrinkage.
It allows switching between the spike and slab throughout the MCMC sampling process.
\[\begin{equation} \begin{aligned} u_{0j} &= \tau_{u_0}z_{ju_0}\\ t_{0j} &= \tau_{t_0}\left( \rho_{u_0t_0}z_{ju_0} + z_{jt_0}\sqrt{1 - \rho_{u_0t_0}^2} \right)\color{red}{\delta_{jt_0}} \end{aligned} \end{equation}\]
Subscripting the indicator by school (\(\boldsymbol{\delta}_j\)) gives each school its own inclusion indicator, and hence its own posterior inclusion probability, for random effect \(k\).
Each element of \(\boldsymbol{\delta}_j\) takes values in \(\{0,1\}\) and follows a \(\delta_{jk} \sim \text{Bernoulli}(\pi)\) distribution.
When a 0 is sampled, the random effect drops out of the scale model, leaving only the fixed effect:
\[\begin{equation} \label{eq:mm_delta} \sigma_{\varepsilon_{ij}} = \begin{cases} \exp(\eta_0), & \text{if }\delta_{jt_0} = 0 , \\ \exp(\eta_0 + t_{0j}), & \text{if }\delta_{jt_0} = 1 \end{cases}. \end{equation}\]
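A minimal sketch of this switching rule, using hypothetical values for the fixed scale intercept \(\eta_0\) and one school's scale deviation \(t_{0j}\):

```python
import math

eta0 = 0.2   # hypothetical fixed scale intercept
t0j = -0.6   # hypothetical scale random effect for school j

def residual_sd(delta_jt0: int) -> float:
    """Residual SD under the spike (delta = 0) or the slab (delta = 1)."""
    return math.exp(eta0 + delta_jt0 * t0j)

print(residual_sd(0))  # spike: exp(eta0), the average within-school SD
print(residual_sd(1))  # slab: school-specific deviation retained
```

With a negative \(t_{0j}\), as here, including the random effect yields a smaller residual SD than the average school, i.e., an unusually consistent school.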
The posterior inclusion probability (PIP) follows from Bayes' rule:
\[\begin{align} \label{eq:pip_theorical} Pr(\delta_{jk} = 1 | \textbf{Y}) = \frac{Pr(\textbf{Y} | \delta_{jk} = 1)Pr(\delta_{jk} = 1)}{Pr(\textbf{Y})}, \end{align}\]
and is estimated in practice as the average of the sampled indicators across \(S\) posterior draws:
\[\begin{align} \label{eq:pip} Pr(\delta_{jk} = 1 | \textbf{Y}) = \frac{1}{S} \sum_{s = 1}^S \delta_{jks}, \end{align}\]
\[\begin{equation} \underbrace{\frac{Pr(\delta_{jk} = 1 | \textbf{Y})}{Pr(\delta_{jk} = 0 | \textbf{Y})}}_{\text{Posterior Odds}} = \underbrace{\frac{Pr(\delta_{jk} = 1)}{Pr(\delta_{jk} = 0)}}_{\text{Prior Odds}} \times \underbrace{\frac{Pr(\textbf{Y} | \delta_{jk} = 1)}{Pr(\textbf{Y} | \delta_{jk} = 0)}}_{\text{Bayes Factor}} \nonumber. \end{equation}\]
When the prior inclusion probability is \(\pi = 0.5\), the prior odds equal 1 and the posterior odds reduce to the Bayes factor:
\[\begin{align} \label{eq:bf_pip} BF_{10j} = \frac{Pr(\delta_{jk} = 1 | \textbf{Y}) }{1 - Pr(\delta_{jk} = 1 | \textbf{Y}) }. \end{align}\]
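These two quantities translate directly into code; the indicator draws below are a made-up posterior sample for a single school:

```python
import numpy as np

# Hypothetical MCMC draws of delta_jk for one school (S = 8 draws)
delta_draws = np.array([1, 1, 1, 0, 1, 1, 0, 1])

pip = delta_draws.mean()   # Monte Carlo estimate of Pr(delta_jk = 1 | Y)
bf10 = pip / (1 - pip)     # Bayes factor, assuming prior odds of 1 (pi = 0.5)

print(pip, bf10)  # 0.75 and 3.0
```

A PIP above a chosen threshold (e.g., 0.75, as used in the results below) flags a school whose within-school variance deviates notably from the average.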
We use a subset of data from the 2021 Brazilian Evaluation System of Elementary Education (Saeb).
It focuses on math scores from 11th and 12th-grade students across 160 randomly selected schools, encompassing a total of 11,386 students.
The analysis compares three SS-MELSM models of increasing complexity: Model 1 (location and scale intercepts only), Model 2 (adding SES covariates), and Model 3 (adding a random slope for student-level SES).
The models were fitted using the ivd package in R (Rast & Carmo, 2017).
All models were fitted with six chains of 3,000 sampling iterations following 12,000 warm-up samples.
We assessed convergence and estimation efficiency using \(\hat{R}\) and the effective sample size (ESS).
The models were compared for predictive accuracy using Pareto-smoothed importance sampling leave-one-out cross-validation (PSIS-LOO).
Model 1 identified eight schools with PIPs exceeding 0.75, suggesting notable deviations from the average within-school variance.
By incorporating SES covariates, Model 2 significantly outperformed Model 1 in terms of predictive accuracy, \(\Delta\widehat{\text{elpd}}_{\text{loo}}\) = -43.6 (SE = 10.5).
Model 3 was practically indistinguishable from Model 2; the inclusion of a random slope for the student-level SES did not improve the model’s predictive accuracy.
The SS-MELSM offers an approach for identifying schools deviating from the norm in terms of within-school variability.
The spike-and-slab prior accounts for uncertainty in including random effects.
Investigating inconsistent schools might reveal variations in teaching quality, student engagement levels, or the impact of external influences on specific schools.
Currently, the SS-MELSM is computationally demanding, especially with larger datasets or more complex models.
It remains unclear how model performance is affected by the choice of hyperparameters, such as the prior inclusion probability \(\pi\).
Further development could explore the method’s performance in longitudinal data settings.
Beyond Averages with MELSM and Spike-and-Slab